Heavy Tails of OLS
Authors: T. Mikosch and C. G. de Vries
Abstract
Suppose the tails of the noise distribution in a regression exhibit power-law behavior. Then the distribution of the OLS regression estimator inherits this tail behavior. This is relevant for regressions involving financial data. We derive explicit finite sample expressions for the tail probabilities of the distribution of the OLS estimator. These are useful for inference. Simulations for medium sized samples reveal considerable deviations of the coefficient estimates from their true values, in line with our theoretical formulas. The formulas provide a benchmark for judging the observed highly variable cross-country estimates of the expectations coefficient in yield curve regressions.

JEL Codes: C13, C16, C20

1. Motivation

Regression coefficients based on financial data are often found to vary considerably across different samples. This observation pertains to finance models like the CAPM beta regression, the forward premium equation and the yield curve regression. In economics, macro models like the monetary model of the foreign exchange rate also yield a wide spectrum of regression coefficients. The uncertainty in CAPM regressions was reviewed in Campbell et al. (1997, Ch. 5) and Cochrane (2001, Ch. 15). Lettau and Ludvigson (2001) explicitly model the time variation in beta. Hodrick (1987) and Lewis (1995) report wildly different estimates for the Fisher coefficient in forward premium regressions. Moreover, typical estimates of the expectations coefficient in yield curve regressions reported by Fama (1976), Mankiw and Miron (1986), and Campbell and Shiller (1991) show substantial variation over time and appear to be downward biased; Campbell et al. (1997, Ch. 10.2) provide a lucid review. The coefficient of the relative money supply in the regression of the exchange rate on the variables of the monetary model of the foreign exchange rate varies considerably around its theoretical unitary value; see for example Frenkel (1993, Ch. 4).
In monetary economics, parameter uncertainty is sometimes explicitly taken into account when it comes to policy decisions; see Brainard (1967) and, more recently, Sack (2000). In a random coefficient model (see for example Feige and Swamy (1974)), the regression coefficients themselves are subject to randomness and therefore fluctuate about some fixed values. Estimation of random coefficient models is reviewed in Chow (1984). Moreover, strategic decisions and rapidly changing business environments imply that one often works with a relatively short data window. Similarly, as soon as some macro variables are part of the regression, one is compelled to use low frequency data and hence a small or medium sized sample.

Date: November 25, 2011.

¹ Thomas Mikosch's research is partly supported by the Danish Natural Science Research Council (FNU) Grants 09-072331 "Point process modelling and statistical inference" and 10-084172 "Heavy tail phenomena: Modeling and estimation". Both authors are grateful for the stimulating research environments provided at EURANDOM in Eindhoven (The Netherlands) and at the Newton Institute in Cambridge (UK) in July 2001 during the thematic period on Managing Risk and Uncertainty, where early parts of this paper were written. We would like to thank Martijn van Harten and Olaf van Veen for their valuable research assistance; the comments and suggestions by the referees, the associate editor and Catalin Starica were very stimulating for the revision of the paper.

² Corresponding author.

T. MIKOSCH AND C. G. DE VRIES

A possible reason for the considerable variation in estimated regression coefficients across different medium sized samples is the heavy-tailed nature of the innovation distribution. It is an acknowledged empirical fact that many financial variables are much better modeled by distributions with tails thicker than those of the normal distribution; see e.g. Embrechts et al. (1997, Ch. 6), Campbell et al. (1997), or Mikosch (2003) and the references therein. In small and medium sized samples, the standard $\sqrt{n}$-rates of convergence for the OLS parameter estimators, based on the central limit theorem (CLT), can be a poor guide to the range of parameter estimates that may occur. Small sample results for the distribution of regression estimators are rare, and exact results are difficult to obtain. In this paper we derive explicit expressions for the tail of the distribution of the regression estimators in finite samples for the case that the noise distribution exhibits heavy tails.¹

Consider the simple regression model
$$Y_t = \beta X_t + \varphi_t .$$
Suppose that the i.i.d. additive noise $\varphi_t$ has a distribution with Pareto-like tails, i.e. $P(|\varphi| > s) \approx c\, s^{-\alpha}$ for high quantiles $s$, some constant $c > 0$ and tail index $\alpha > 0$. For example, the Student-t distribution with $\alpha$ degrees of freedom fits this assumption. The ordinary least squares estimator of $\beta$ reads
$$\hat\beta = \frac{\sum_{t=1}^n X_t Y_t}{\sum_{t=1}^n X_t^2} = \beta + \beta_n, \qquad \text{where}\quad \beta_n := \frac{\sum_{t=1}^n \varphi_t X_t}{\sum_{t=1}^n X_t^2}.$$
We show that under some mild conditions, for example if the $X_t$ are i.i.d. with a standard uniform distribution and the $\varphi_t$ follow a Student-t distribution with $\alpha$ degrees of freedom,
$$P(\beta_n > x) = P(\beta_n \le -x) \sim n\, E\left|\frac{X_1}{\sum_{s=1}^n X_s^2}\right|^{\alpha} P(\varphi > x), \qquad x \to \infty.$$
Note that this is a fixed finite sample size $n$ cum large deviation $x$ result. This relation shows very clearly that, for fixed $n$ and large $x$, the tail probability of $\beta_n$ deviates strongly from the normal-based tail which, for fixed $x$ and large $n$, would be prescribed by the CLT. In particular, the resulting Pareto-like tails of $\beta_n$ yield a possible explanation for the empirical observation that regression estimates often fluctuate wildly around their theoretical values. We also emphasize that the above Pareto-like tail probabilities can be used for statements about very high quantiles of $\beta_n$. Suppose $q > x$ is an even higher quantile, possibly at the border of or even outside the range of the data.
Then
$$q \approx x \left(\frac{P(\beta_n > x)}{P(\beta_n > q)}\right)^{1/\alpha}.$$
Below we demonstrate that in small and medium sized samples with heavy-tailed innovations, this approximation is considerably better than the normal (CLT) based approach. It can be used to gauge the probability of observing regression coefficients of unusual size.

The results hinge on relatively weak assumptions regarding the stochastic nature of the explanatory variable. For the linear model above we require that the joint density of the explanatory variables is bounded in some neighborhood of the origin. A restriction is the condition that the regressor be exogenous; however, the regressor is not assumed to be fixed. In addition to the case of additive noise $\varphi$, we also investigate the case of random coefficients $\beta$, i.e. the case with multiplicative noise. Moreover, we allow for the possibility that the multiplicative noise component is correlated with the additive noise term; in this sense there can be correlation between the economic explanatory part and the additive noise structure. Both the noise and the regressor are allowed to be time dependent. The time dependence has an extra effect on the dispersion of the regression coefficients.

¹ In large samples, given that the innovations have finite variance, CLT-based results would apply.

The paper does not propose alternative regression procedures such as the estimator studied in Blattberg and Sargent (1971), the Least Absolute Deviations estimator, the rank estimators investigated in this issue by Hallin, Swan, Vandebout and Veredas (2011), tail trimming for GMM estimation studied by Hill and Renault (2010), the partially adaptive methods proposed in Butler, McDonald, Nelson and White (1990), or the maximum likelihood procedure for residuals that follow infinite variance stable distributions as considered in Nolan and Revah (2011); nor do we venture into the issue of model identification of infinite variance autoregressive processes as investigated in Andrews and Davis (2011). The purpose of our paper is different. We investigate the shape of the distribution of regression coefficients when the standard OLS procedure is applied in the case that the innovations are heavy-tailed. Thus, while the alternative estimators are meant to overcome the deficiencies of the OLS procedure in the presence of heavy tails, we quantitatively describe the properties of the OLS procedure in this situation. The OLS method is very widely applied, including to financial data, which are known to exhibit heavy tails. Therefore it is of interest to understand the OLS results under non-standard conditions.

The theoretical results are first illustrated by means of a simulation experiment. The Monte Carlo study demonstrates that in medium sized samples the estimated coefficients can deviate considerably from their true values. The expressions for the tail probabilities are shown to work as anticipated, and their use for inference is demonstrated. Subsequently, we investigate the relevance of the theory for the wide dispersion of the expectations hypothesis coefficients in yield curve regressions.

2. The Model

We study the regression model
$$Y_t = (\beta + \varepsilon_t) X_t + \varphi_t, \tag{2.1}$$
where $(\varepsilon_t, \varphi_t)$ is a strictly stationary noise sequence of 2-dimensional random vectors, and $(X_t)$ is a sequence of explanatory variables, independent of the noise.
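A minimal Monte Carlo sketch of model (2.1) may help fix ideas. It assumes standard uniform regressors $X_t$ and, in place of Student-t noise, a symmetric Pareto law with $P(|\varphi| > s) = s^{-\alpha}$ for $s \ge 1$. This substitution is made here only because it makes $P(\varphi > x) = x^{-\alpha}/2$ exact. Substituting (2.1) into the OLS formula gives $\hat\beta - \beta = \sum_t \varepsilon_t X_t^2 / \sum_t X_t^2 + \sum_t \varphi_t X_t / \sum_t X_t^2$. With `eps_scale=0` the model reduces to the simple regression of Section 1, and `tail_check` compares the simulated tails of $\beta_n$ with $n\,E|X_1/\sum_s X_s^2|^{\alpha} P(\varphi > x)$. The helper `extrapolate_quantile` implements the high-quantile relation $q \approx x\,(P(\beta_n > x)/P(\beta_n > q))^{1/\alpha}$ from Section 1. All function names and parameter values ($n = 10$, $\alpha = 1.5$, $x = 10$) are hypothetical choices for illustration and are not taken from the paper's own simulation design.

```python
import random


def sym_pareto(rng: random.Random, alpha: float) -> float:
    """Symmetric Pareto draw: P(|Z| > s) = s**(-alpha) for s >= 1, random sign.

    Assumed stand-in for Student-t noise with tail index alpha; its right
    tail is exactly P(Z > x) = 0.5 * x**(-alpha) for x >= 1.
    """
    magnitude = (1.0 - rng.random()) ** (-1.0 / alpha)  # inverse-CDF draw; 1 - U lies in (0, 1]
    return magnitude if rng.random() < 0.5 else -magnitude


def simulate_ols(rng: random.Random, n: int, alpha: float, beta: float = 1.0, eps_scale: float = 0.0):
    """One sample of Y_t = (beta + eps_t) X_t + phi_t with X_t ~ U(0, 1), fitted by OLS without intercept.

    Returns (beta_hat, dev_eps, dev_phi), where beta_hat = beta + dev_eps + dev_phi.
    """
    xs = [rng.random() for _ in range(n)]
    eps = [eps_scale * sym_pareto(rng, alpha) for _ in range(n)]  # multiplicative noise (zero if eps_scale == 0)
    phi = [sym_pareto(rng, alpha) for _ in range(n)]              # additive noise
    ys = [(beta + e) * x + p for x, e, p in zip(xs, eps, phi)]
    sxx = sum(x * x for x in xs)
    beta_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
    dev_eps = sum(e * x * x for e, x in zip(eps, xs)) / sxx  # contribution of eps_t
    dev_phi = sum(p * x for p, x in zip(phi, xs)) / sxx      # contribution of phi_t (beta_n in Section 1)
    return beta_hat, dev_eps, dev_phi


def tail_check(n: int = 10, alpha: float = 1.5, x: float = 10.0, reps: int = 40_000, seed: int = 1):
    """Simulated P(beta_n > x), P(beta_n <= -x) versus n E|X_1 / sum_s X_s^2|^alpha P(phi > x)."""
    rng = random.Random(seed)
    upper = lower = 0
    for _ in range(reps):
        _, _, beta_n = simulate_ols(rng, n, alpha)  # additive noise only
        upper += beta_n > x
        lower += beta_n <= -x
    # n E|X_1 / sum_s X_s^2|^alpha equals E[sum_t |X_t / sum_s X_s^2|^alpha] by exchangeability
    moment = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        sxx = sum(v * v for v in xs)
        moment += sum((v / sxx) ** alpha for v in xs)
    approx = (moment / reps) * 0.5 * x ** (-alpha)  # multiply by P(phi > x) of the symmetric Pareto noise
    return upper / reps, lower / reps, approx


def extrapolate_quantile(x: float, p_x: float, p_q: float, alpha: float) -> float:
    """Level q with P(beta_n > q) = p_q, given P(beta_n > x) = p_x and a Pareto-like tail of index alpha."""
    return x * (p_x / p_q) ** (1.0 / alpha)


if __name__ == "__main__":
    up, low, approx = tail_check()
    print(f"P(beta_n > 10) ~ {up:.4f}, P(beta_n <= -10) ~ {low:.4f}, approximation {approx:.4f}")
    print("level exceeded with probability 0.001:", extrapolate_quantile(10.0, approx, 0.001, 1.5))
```

Under these assumptions the two simulated tail probabilities should be of the same order as the approximation. The quantile helper is exact only for a pure Pareto tail, so in practice it inherits the error of the tail approximation at the reference level $x$.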
In what follows, we write $\varphi$, $\varepsilon$, etc., for generic elements of the strictly stationary sequences $(\varphi_t)$, $(\varepsilon_t)$, etc. The coefficient $\beta$ is a fixed parameter to be estimated by regression. The model (2.1) comprises a large variety of different economic models since it allows for both additive and multiplicative uncertainty. If the noises $\varepsilon_t$ and $\varphi_t$ have zero mean, then, conditionally on the information at time $t-1$, the model (2.1) captures the structure of many of the rational expectations finance models such as the CAPM.

In what follows, we assume that the right tails of the marginal distributions $F_\varepsilon(x)$ and $F_\varphi(x)$ of $\varepsilon$ and $\varphi$, respectively, are regularly varying with index $\alpha > 0$. This means that the limits
$$\lim_{x\to\infty} \frac{1 - F(xs)}{1 - F(x)} = s^{-\alpha} \quad \text{for all } s > 0 \tag{2.2}$$
exist for $F \in \{F_\varepsilon, F_\varphi\}$. Regular variation entails that the $(\alpha+\delta)$-th moments of $F$ are infinite for $\delta > 0$, supporting the intuition behind the notion of a heavy-tailed distribution. Some prominent members of the class of distributions with regularly varying tails are the Student-t, F-, Fréchet, infinite variance stable and Pareto distributions. First order approximations to the tails of these distribution functions $F$ are comparable to the tail $c\, x^{-\alpha}$ of a Pareto distribution for some $c, \alpha > 0$, i.e.,
$$\lim_{x\to\infty} \frac{1 - F(x)}{c\, x^{-\alpha}} = 1 .$$
The power-like decay in the right tail implies the lack of moments higher than $\alpha$. There are other distributions with fatter tails than the normal distribution, such as the exponential or lognormal distributions. But these distributions possess all power moments. They are less suitable for capturing the very large positive and very large negative values observed in financial data sets.

Independent positive random variables $A_1, \dots, A_n$ with regularly varying right tails (possibly with different indices) satisfy a well known additivity property of their convolutions; see for example Feller (1971).
This means that
$$\lim_{x\to\infty} \frac{\sum_{i=1}^n P(A_i > x)}{P\left(\sum_{i=1}^n A_i > x\right)} = 1 . \tag{2.3}$$
This is a useful fact when it comes to evaluating the distributional tail of (weighted) sums of random variables with regularly varying tails. The ordinary least squares (OLS) estimator of $\beta$ is composed of such sums, but it also involves products and ratios of random variables. In particular, the OLS estimator $\hat\beta$ of $\beta$ in model (2.1) is given by
$$\hat\beta = \frac{\sum_{t=1}^n X_t Y_t}{\sum_{t=1}^n X_t^2} = \beta + \beta_{n,\varepsilon} + \beta_{n,\varphi}, \tag{2.4}$$
and involves the terms
$$\beta_{n,\varepsilon} := \frac{\sum_{t=1}^n \varepsilon_t X_t^2}{\sum_{t=1}^n X_t^2} \quad\text{and}\quad \beta_{n,\varphi} := \frac{\sum_{t=1}^n \varphi_t X_t}{\sum_{t=1}^n X_t^2}. \tag{2.5}$$
Thus, in the case of fixed regressors $X_t$ and noise $(\varepsilon_t, \varphi_t)$ whose components have distributions with regularly varying tails, one can rely on the additivity property (2.3). But if the regressors are stochastic, we face a more complicated problem, for which we derive new results.

This paper investigates the finite sample variability of the regression coefficient estimator in models with additive noise and random coefficients when the noise comes from a heavy-tailed distribution. In Section 3 we derive the finite sample tail properties of the distribution of the OLS estimator of $\beta$ in model (2.1) when the noise has a distribution with regularly varying tails; see (2.2). The simulation study in Section 4 conveys the relevance of the theory. Section 5 applies the theory to the distribution of the expectations coefficient in yield curve estimation. Some proofs are relegated to the Appendix.

3. Theory

In this section we derive the finite sample tail properties of the distribution of the OLS regression coefficient estimator in the model (2.1) when the noise distribution has regularly varying tails. To this end we first recall in Section 3.1.1 the definitions of regular and slow variation as well as the basic scaling property for convolutions of random variables with regularly varying distributions.
Subsequently, we obtain the regular variation properties for inner products of those vectors of random variables that appear in the OLS estimator of $\beta$. The joint distribution of these inner products is multivariate regularly varying. In Section 3.1.2 we give conditions for the finiteness of moments of quotients of random variables. Finally, we derive the asymptotic tail behavior of the distribution of the OLS estimator of $\beta$ by combining the previous results. We present the main results on the finite sample tail behavior of $\hat\beta$ for i.i.d. regularly varying noise (Section 3.2.1) and for regularly varying linearly dependent noise (Section 3.2.2), and give some comments on the case of general regularly varying noise (Section 3.2.3).

3.1. Preliminaries.

3.1.1. Regular variation. A positive measurable function $L$ on $[0, \infty)$ is said to be slowly varying if
$$\lim_{x\to\infty} \frac{L(cx)}{L(x)} = 1 \quad \text{for all } c > 0 .$$
The function $g(x) = x^{\rho} L(x)$ for some $\rho \in \mathbb{R}$ is then said to be regularly varying with index $\rho$. We say that the random variable $X$ and its distribution $F$ (we use the same symbol $F$ for its distribution function) are regularly varying with (tail) index $\alpha \ge 0$ if there exist $p, q \ge 0$ with $p + q = 1$ and a slowly varying function $L$ such that
$$F(-x) = q\, x^{-\alpha} L(x)\,(1 + o(1)) \quad\text{and}\quad \bar F(x) := 1 - F(x) = p\, x^{-\alpha} L(x)\,(1 + o(1)), \qquad x \to \infty . \tag{3.1}$$
Condition (3.1) is usually referred to as a tail balance condition. For an encyclopedic treatment of regular variation, see Bingham et al. (1987). In what follows, $a(x) \sim b(x)$ for positive functions $a$ and $b$ means that $a(x)/b(x) \to 1$, usually as $x \to \infty$.

We start with an auxiliary result which is a slight extension of Lemma 2.1 in Davis and Resnick (1996), where this result was proved for non-negative random variables. The proof in the general case is analogous and therefore omitted.

Lemma 3.1. Let $G$ be a distribution function concentrated on $(0, \infty)$ satisfying (3.1).
Assume $Z_1, \dots, Z_n$ are random variables such that
$$\lim_{x\to\infty} \frac{P(Z_i > x)}{\bar G(x)} = c_i^+ \quad\text{and}\quad \lim_{x\to\infty} \frac{P(Z_i \le -x)}{\bar G(x)} = c_i^-, \qquad i = 1, \dots, n, \tag{3.2}$$
for some non-negative numbers $c_i^{\pm}$, and
$$\lim_{x\to\infty} \frac{P(|Z_i| > x,\ |Z_j| > x)}{\bar G(x)} = 0, \qquad i \ne j .$$
Then
$$\lim_{x\to\infty} \frac{P(Z_1 + \cdots + Z_n > x)}{\bar G(x)} = c_1^+ + \cdots + c_n^+ \quad\text{and}\quad \lim_{x\to\infty} \frac{P(Z_1 + \cdots + Z_n \le -x)}{\bar G(x)} = c_1^- + \cdots + c_n^- .$$

The following result is a consequence of this lemma.

Lemma 3.2. Suppose the $Z_i$ are regularly varying random variables with tail indices $\alpha_i > 0$, $i = 1, \dots, n$. Assume that one of the following conditions holds.
(1) The $Z_i$'s are independent and satisfy (3.2) with $\bar G(x) = P(|Z_1| > x)$, $x > 0$.
(2) The $Z_i$'s are non-negative and independent.
(3) $Z_1$ and $Z_2$ are regularly varying with indices $0 < \alpha_1 < \alpha_2$, and the parameters $p_1, q_1$ in the tail balance condition (3.1) for the distribution of $Z_1$ are positive.
Then under (1) or (2) the relations
$$P(Z_1 + \cdots + Z_n > x) \sim P(Z_1 > x) + \cdots + P(Z_n > x), \qquad P(Z_1 + \cdots + Z_n \le -x) \sim P(Z_1 \le -x) + \cdots + P(Z_n \le -x)$$
hold as $x \to \infty$. If condition (3) applies, then, as $x \to \infty$,
$$P(Z_1 + Z_2 > x) \sim P(Z_1 > x) \quad\text{and}\quad P(Z_1 + Z_2 \le -x) \sim P(Z_1 \le -x) .$$
The proof is given in the Appendix.

Recall the definition of a regularly varying random vector $X$ with values in $\mathbb{R}^d$; see for example de Haan and Resnick (1977) and Resnick (1986, 1987). In what follows, $\mathbb{S}^{d-1}$ denotes the unit sphere in $\mathbb{R}^d$ with respect to a (given) norm $|\cdot|$, and $\stackrel{v}{\to}$ refers to vague convergence on the Borel $\sigma$-field of $\mathbb{S}^{d-1}$; see Resnick (1986, 1987) for details.

Definition 3.3. The random vector $X$ with values in $\mathbb{R}^d$ and its distribution are said to be regularly varying with index $\alpha$ and spectral measure $P_\Theta$ if there exists a random vector $\Theta$ with values in $\mathbb{S}^{d-1}$ and distribution $P_\Theta$ such that the following limit exists for all $t > 0$:
$$\frac{P(|X| > tx,\ X/|X| \in \cdot\,)}{P(|X| > x)} \stackrel{v}{\to} t^{-\alpha} P_\Theta(\cdot), \qquad x \to \infty . \tag{3.3}$$
The vague convergence in (3.3) means that
$$\frac{P(|X| > tx,\ X/|X| \in S)}{P(|X| > x)} \to t^{-\alpha} P_\Theta(S)$$
for all Borel sets $S \subset \mathbb{S}^{d-1}$ such that $P_\Theta(\partial S) = 0$, where $\partial S$ denotes the boundary of $S$.
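The additivity property (2.3), i.e. case (2) of Lemma 3.2, can be checked with a small Monte Carlo sketch. It assumes independent Pareto summands on $[1, \infty)$, so that each $P(Z_i > x) = x^{-\alpha_i}$ is known exactly, and estimates the ratio $P(Z_1 + \cdots + Z_n > x) / \sum_i P(Z_i > x)$, which should approach one as $x$ grows. With unequal indices, the lighter tail is asymptotically negligible, so the heavier-tailed summand alone drives the tail of the sum, in the spirit of case (3). The function names and parameter values are illustrative assumptions.

```python
import random


def pareto(rng: random.Random, alpha: float) -> float:
    """Pareto draw on [1, inf) with P(Z > s) = s**(-alpha), s >= 1 (inverse-CDF method)."""
    return (1.0 - rng.random()) ** (-1.0 / alpha)  # 1 - U lies in (0, 1], so the draw is finite


def sum_tail_ratio(alphas, x: float, reps: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Z_1 + ... + Z_n > x) / (P(Z_1 > x) + ... + P(Z_n > x))
    for independent Pareto variables Z_i with tail indices `alphas`."""
    rng = random.Random(seed)
    hits = sum(sum(pareto(rng, a) for a in alphas) > x for _ in range(reps))
    sum_of_tails = sum(x ** (-a) for a in alphas)  # exact for Pareto on [1, inf) when x >= 1
    return (hits / reps) / sum_of_tails


if __name__ == "__main__":
    # equal indices: the ratio should be near 1 for large x (Lemma 3.2, case (2))
    print(sum_tail_ratio([1.5, 1.5, 1.5], x=100.0))
    # unequal indices: P(Z_2 > x) is negligible, so the alpha = 1.5 summand drives the tail
    print(sum_tail_ratio([1.5, 3.0], x=50.0))
```

For non-negative summands, one would expect the ratio to be slightly above one at moderate $x$. The reason is that the remaining summands push the sum upward, and this effect fades as $x$ grows.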
Alternatively, (3.3) is equivalent to the totality of the relations
$$\frac{P(X \in xA)}{P(|X| > x)} \to \mu(A) .$$
Here $\mu$ is a non-null measure on the Borel $\sigma$-field of $\mathbb{R}^d \setminus \{0\}$ with the property $\mu(tA) = t^{-\alpha} \mu(A)$, $t > 0$, and the relations hold for any Borel set $A \subset \mathbb{R}^d \setminus \{0\}$ that is bounded away from zero and satisfies $\mu(\partial A) = 0$. We have the following result.

Lemma 3.4. Assume that $X = (X_1, \dots, X_d)$ is regularly varying in $\mathbb{R}^d$ with index $\alpha > 0$ and is independent of the random vector $Y = (Y_1, \dots, Y_d)$, which satisfies $E|Y|^{\alpha+\delta} < \infty$ for some $\delta > 0$. Then the scalar product $Z = X'Y$ is regularly varying with index $\alpha$. Moreover, if $X$ has independent components, then, as $x \to \infty$,
$$P(Z > x) \sim P(|X| > x) \left[\sum_{i=1}^d c_i^+ E\big[Y_i^{\alpha} I_{\{Y_i > 0\}}\big] + \sum_{i=1}^d c_i^- E\big[|Y_i|^{\alpha} I_{\{Y_i < 0\}}\big]\right];$$